
Embracing the future of Artificial Intelligence in the classroom: the relevance of AI literacy, prompt engineering, and critical thinking in modern education

Prompt engineering, at its core, involves the strategic crafting of inputs to elicit desired responses or behaviors from AI systems. In educational settings, this translates to designing prompts that not only engage students but also challenge them to think critically and creatively. The art of prompt engineering lies in its ability to transform AI from a mere repository of information into an interactive tool that stimulates deeper learning and understanding (cf. Lee et al., 2023). The relevance of prompt engineering in education cannot be overstated. As AI becomes increasingly sophisticated and integrated into learning environments, the ability to communicate effectively with these systems becomes crucial. Prompt engineering empowers educators to guide AI interactions in a way that enhances the educational experience. It allows for the creation of tailored learning scenarios that can adapt to the needs and abilities of individual students, making learning more engaging and effective (Eager & Brunton, 2023). One of the most significant impacts of prompt engineering is its potential to enhance learning experiences and foster critical thinking. By carefully designing prompts, educators can encourage students to approach problems from different perspectives, analyze information critically, and develop solutions creatively. This approach not only deepens their understanding of the subject matter but also hones their critical thinking skills, an essential competency in today’s fast-paced and ever-changing world. As one particular study showed, learning to prompt effectively in the classroom can even help students recognize the limits of AI, which in turn fosters their AI literacy (Theophilou et al., 2023). Moreover, AI has the potential to enable highly interactive and playful teaching settings: with the right programs, it can also be integrated into game-based learning. This combination has the potential to transform traditional learning paradigms, making education more accessible, enjoyable, and impactful (Chen et al., 2023).

Recently, a handful of successful prompting methodologies have emerged, and they are continuously being improved. Prompt engineering is an experimental discipline: through trial and error, one can gradually learn to create better outputs by revising and molding the input prompts. As a scientific discipline, AI itself can help to find new ways to interact with AI systems. The most relevant prompting methods are summarized in Table 4 and explained thereafter.

Table 4 Summary of the recently established prompting methods for interacting with LLMs

There are two major forms in which a language model can be prompted: (i) Zero-Shot prompts, and (ii) Few-Shot prompts. Zero-Shot prompts are the most intuitive alternative, which most of us predominantly use when interacting with models like ChatGPT: a simple prompt is provided without much further detail, and a rather unspecific response is generated. This is helpful when one deals with broad problems or with situations where there is not a lot of data. Few-Shot prompting is a technique where a prompt is enriched with several examples of how the task should be completed, which is helpful for complex queries where concrete ideas or data are already available. As the name suggests, these “shots” can be enumerated as follows (based on Dang et al., 2022; Kojima et al., 2022; Tam, 2023), with a short illustrative sketch following the list:

Zero-Shot prompts: There are no specific examples added.

One-Shot prompts: One specific example is added to the prompt.

Two-Shot prompts: Two examples are added to the prompt.

Three-Shot prompts: Three examples are added to the prompt.

Few-Shot prompts: Several examples are added to the prompt (unspecified how many).
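To make this distinction concrete, the following sketch constructs a Zero-Shot and a Few-Shot prompt for the same illustrative sentiment-classification task; the task, the example reviews, and the exact wording are assumptions made for this illustration and are not drawn from the cited studies.

```python
# Illustrative sketch: the same task posed as a Zero-Shot prompt (no examples)
# and as a Few-Shot prompt (worked examples, or "shots", prepended).
# The task and the reviews are invented for demonstration purposes only.

task_instruction = "Classify the sentiment of the following review as positive or negative."

# Zero-Shot: only the instruction and the query, no examples.
zero_shot_prompt = (
    f"{task_instruction}\n\n"
    "Review: 'The plot dragged on forever.'\n"
    "Sentiment:"
)

# Few-Shot (here: Two-Shot): the same instruction, preceded by two worked examples.
few_shot_prompt = (
    f"{task_instruction}\n\n"
    "Review: 'An absolute delight from start to finish.'\n"
    "Sentiment: positive\n\n"
    "Review: 'I walked out after twenty minutes.'\n"
    "Sentiment: negative\n\n"
    "Review: 'The plot dragged on forever.'\n"
    "Sentiment:"
)

print(zero_shot_prompt)
print(few_shot_prompt)
```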

These prompting methods have gradually developed and become more complex, starting from Input–Output Prompting all the way to Tree-of-Thought Prompting, as displayed in Table 4.

When people start prompting an AI, they usually begin with simple prompts, like “Tell me something about…”. As such, the user inserts a simple input prompt and a rather unspecific, generalized output response is generated. The more specific the answer should be, the more concrete and narrow the input prompt should be. These are called Input–Output prompts (IOP) and are the simplest and most common form of prompting an AI (Liu et al., 2021). It has been found that the results turn out much better when there is not simply a straight line from the input to the output but when the AI has to insert some reasoning steps (Wei et al., 2023). This is referred to as Chain-of-Thought (CoT) prompting, where the machine is asked to explain the reasoning steps that lead to a certain outcome. The framework that has historically worked well is to prompt the AI to provide a solution “step-by-step”. Practically, it is possible to give ChatGPT or any other LLM a task and then simply add: “Do this step-by-step.” Interestingly, experiments have further shown that the results get even better when the system is first told to “take a deep breath”. Hence, “Take a deep breath and do it step-by-step” has become a popular addendum to any prompt (Wei et al., 2023). Such general addendums that can be added to any prompt to improve the results are sometimes referred to as a “universal and transferrable prompt suffix”, which is frequently employed as a method to successfully jailbreak an LLM (Zou et al., 2023).
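As a brief illustration, the following sketch appends the CoT addendum discussed above to an otherwise plain Input–Output prompt; the classroom arithmetic task is a made-up example.

```python
# Illustrative sketch: turning an Input-Output prompt into a Chain-of-Thought
# prompt by appending the "step-by-step" addendum discussed in the text.
# The classroom task is invented for demonstration purposes.

task = (
    "A classroom has 4 rows of 7 chairs, and 5 of the chairs are broken. "
    "How many usable chairs are there?"
)

# Input-Output prompting: just the bare task.
iop_prompt = task

# Chain-of-Thought prompting: the same task plus the popular addendum.
cot_prompt = f"{task}\n\nTake a deep breath and do it step-by-step."

print(iop_prompt)
print(cot_prompt)
```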

Yet another prompt engineering improvement is the discovery that narrative role plays can yield significantly better results. This means that an LLM is asked to put itself in the shoes of a person with a specific role, which usually helps the model be much more specific in the answer it provides. Often, this is done via a specific form of role play known as expert prompting (EP). The idea is that the model should assume the role of an expert (where the role of the expert is first explained in detail) and then generate the result from an expert’s perspective. It has been demonstrated that this is a way to prompt the AI to be a lot more concrete and less vague in its responses (Xu et al., 2023). Building explicitly on CoT-prompting, a further improvement was detected in what has come to be known as Self-Consistency (SC) prompting. This one deliberately works with CoT-phrases like “explain step by step…”, but adds that not only one line of reasoning but multiple lines should be pursued. Since not all of these lines may be equally viable and we may not want to analyze all of them ourselves, the model should extend its reasoning capacity to discern which of these lines makes the most sense in light of a given criterion. The reason for using SC-prompting is to minimize the risk of AI hallucination (meaning that the AI might be inventing things that are not true) and thus to let the model hash out for itself whether a generated solution might be potentially wrong or not ideal (Wang et al., 2023). In practice, there may be two ways to enforce self-consistency (a combined sketch of expert prompting and self-consistency follows the examples below):

Generalized Self-Consistency: The model should determine itself why one line of reasoning makes the most sense and explain why this is so.

Example:

“Discuss each of the generated solutions and explain which one is most plausible.”

Criteria-based Self-Consistency: The model is provided with specific information (or: criteria) that should be used to evaluate which line of reasoning holds up best.

Example:

“Given that we want to respect the fact that people like symmetric faces, which of these portraits is the most beautiful? Explain your thoughts and also include the notion of face symmetry.”
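As announced above, the following sketch combines expert prompting with criteria-based self-consistency; the expert role, the question, and the evaluation criterion are illustrative assumptions rather than material taken from the cited studies.

```python
# Illustrative sketch: expert prompting (EP) combined with Self-Consistency (SC).
# The role description, question, and criterion are invented for demonstration.

role = (
    "You are an experienced high-school physics teacher who explains concepts "
    "with concrete classroom examples."
)

question = "Explain why a day on Venus is longer than a year on Venus."

# Expert prompting: establish the role first, then pose the task.
expert_prompt = f"{role}\n\n{question}"

# Criteria-based self-consistency: ask for several independent lines of
# reasoning and let the model judge them against a stated criterion.
sc_prompt = (
    f"{expert_prompt}\n\n"
    "Work out three independent explanations, each step by step. Then compare "
    "the three explanations and state which one is most plausible, using "
    "physical accuracy as the deciding criterion."
)

print(sc_prompt)
```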

Sometimes, one may feel a little uncreative, not knowing how to craft a good prompt to guide the machine towards the preferred response. This is here referred to as the prompt-wise tabula-rasa problem, since it feels like one is sitting in front of a blank “white paper” with no clue how best to start. In such cases, there are two prompting techniques that can help. One is called the Automatic Prompt Engineer (APE) and the other is known as Generated Knowledge Prompting (GKn). APE starts out with one or several examples (of text, music, images, or anything else the model can work with), with the goal of asking the AI which prompts would work best to generate these (Zhou et al., 2023). This is helpful when we already know what a good response would look like but do not know how to guide the model to this outcome. An example would be: “Here is a love letter from a book that I like. I would like to write something similar to my partner but I don’t know how. Please provide me with some examples of how I could prompt an AI to create a letter in a similar style.” The result is a list of initial prompts that can help the user kickstart refining the preferred prompt so that eventually a letter can be crafted that suits the user’s fancy. This basically hands the hard work of thinking through possible prompts to the computer and leaves the user the job of refining the resulting suggestions.
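A compact sketch of how such an APE-style request could be phrased is given below; the letter excerpt is a fictitious placeholder rather than a quotation from any source.

```python
# Illustrative sketch of an Automatic Prompt Engineer (APE) style request:
# the user supplies an example of the desired output and asks the model which
# prompts would best reproduce that style. The excerpt is entirely fictitious.

example_output = (
    "My dearest, every morning the day seems brighter simply because "
    "I know you are in it."
)

ape_prompt = (
    "Here is an excerpt from a letter whose style I admire:\n\n"
    f'"{example_output}"\n\n'
    "Please suggest five different prompts I could give an AI writing "
    "assistant so that it produces a letter to my partner in a similar style."
)

print(ape_prompt)
```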

A similar method is known as Generated Knowledge (GKn) prompting, which assumes that it is best to first “set the scene” in which the model can then operate. There are parallels to both EP and APE prompting, in that a narrative framework is constructed to act as a reference from which the AI draws its information; only this time, as in APE, the knowledge is not provided by the human but generated by the machine itself (Liu et al., 2022). An example might be: “Please explain what linguistics tells us about how the perfect poem should look. What are the criteria for this? Can you provide me with three examples?” Once the stage is set, one can start with the actual task: “Based on this information, please write a poem about…” There are two ways to create Generated Knowledge tasks: (i) the single prompt approach, and (ii) the dual prompt approach (a sketch of both follows the two steps below). The first simply places all the information within one prompt and then runs the model. The second works with two individual steps:

Step 1: First some facts about a topic are generated (one prompt)

Step 2: Once this is done, the model is prompted again to do something with this information (another prompt)
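The sketch announced above contrasts the two approaches; the poem topic and the exact wording are illustrative assumptions.

```python
# Illustrative sketch: Generated Knowledge (GKn) prompting in its two variants.
# The topic and the wording are invented for demonstration purposes.

# (i) Single prompt approach: knowledge generation and the actual task in one prompt.
single_prompt = (
    "First, explain what linguistics tells us about how the perfect poem "
    "should be structured and name three example poems that meet these "
    "criteria. Based on this information, write a poem about the first day "
    "of spring."
)

# (ii) Dual prompt approach: the same content split across two turns.
step_1 = (
    "Explain what linguistics tells us about how the perfect poem should be "
    "structured, and name three example poems that meet these criteria."
)
step_2 = (
    "Based on the criteria and examples you just generated, write a poem "
    "about the first day of spring."
)

print(single_prompt)
print(step_1)
print(step_2)
```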

Although AI systems are being equipped with increasingly longer context windows (the part of the current conversation the model can “remember”, like a working memory), they have been shown to rely more strongly on data at the beginning and at the end of the window (Liu et al., 2023). Since there is thus evidence that not all information within a prompt is weighed equally and deemed relevant by the model, in some cases the dual prompt or even a multiple prompt approach may yield better results.

To date, perhaps the most complicated method is known as Tree-of-Thought (ToT) prompting. The landmark paper by Yao et al. (2023) introducing the method has received considerable attention in the community, as it described a significant improvement and also highlighted shortcomings of previous methods. ToT uses a combination of CoT- and SC-prompting and builds on this the idea that one can go back and forth, eventually converging on the best line of reasoning. It is similar to a chess game where there are many possibilities for the next move and the player has to think through multiple scenarios in their head, mentally going back and forth with certain pieces, before eventually deciding upon the best next move. As an example, think of it like this: imagine that you have three experts, each holding differing opinions. They each lay out their arguments in a well-thought-through (step-by-step) fashion. If one makes an argumentative mistake, that expert concedes this and goes a step back towards the previous position to take a different route. The experts discuss with each other until they all agree upon the best result. This framing is what can be called the ToT-context, which applies regardless of the specific task. The task itself is then the query to solve a specific problem. Hence a simplified example would look like this:

1. ToT-Context:

“Imagine that there are three experts in the field discussing a specific problem. They each lay out their arguments step-by-step. They all hold different opinions at the start. After each step, they discuss which arguments are the best and each must defend its position. If there are clear mistakes, the expert will concede this and go a step back to the previous position to take the route of a different argument related to the position. If there are no other plausible routes, the expert will agree with the most likely solution still in discussion. This should occur until all experts have agreed with the best available solution.”

2. Task:

“The specific problem looks like this: Imagine that Thomas is going swimming. He walks into the changing cabin carrying a towel. He wraps his watch inside the towel and brings it to his chair next to the pool. At the chair, he opens the towel and dries himself. Then he goes to the kiosk. There he forgets his towel and jumps into the pool. Later, he realizes that he lost his watch. Which is the most likely place where Thomas lost it?”
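The following sketch shows how the two parts can be assembled into a single ToT-style prompt; the condensed wording is this illustration’s own and slightly abbreviates the context and task quoted above.

```python
# Illustrative sketch: assembling a Tree-of-Thought (ToT) prompt from a reusable
# ToT-context and a task-specific query, condensing the example quoted above.

tot_context = (
    "Imagine three experts discussing a specific problem. Each lays out their "
    "arguments step by step. After each step they discuss which arguments are "
    "best; an expert who notices a clear mistake concedes it, goes a step back, "
    "and tries a different route. This continues until all experts agree on the "
    "best available solution."
)

task = (
    "Thomas wraps his watch inside a towel in the changing cabin, carries the "
    "towel to his chair next to the pool, opens it there and dries himself, "
    "then forgets the towel at the kiosk and jumps into the pool. Later he "
    "realizes his watch is gone. Where did Thomas most likely lose it?"
)

tot_prompt = f"{tot_context}\n\nThe specific problem looks like this: {task}"
print(tot_prompt)
```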

The present author’s experiments have indicated that GPT-3.5 provides false answers to this task when asked with Input–Output prompting, whereas the responses turned out to be correct when asked with ToT-prompting. GPT-4 sometimes implements a similar method without being prompted to do so, but often it does not do so automatically. A precursor of ToT was known as Prompt Ensembling (or DiVeRSe: Diverse Verifier on Reasoning Steps), which worked with a three-step process: (i) using multiple prompts to generate diverse answers; (ii) using a verifier to distinguish good from bad responses; and (iii) using a verifier to check the correctness of the reasoning steps (Li et al., 2023).

Sometimes, there seems to be a degree of arbitrariness regarding prompting best practices, which may have to do with the way a model was trained. For example, telling GPT to “take a deep breath” does in fact appear to result in better outcomes, but it also seems strange. Most likely, this has to do with the fact that in its training material (which, nota bene, incorporates large portions of the publicly available internet data) this statement is associated with more nuanced behaviors. Just recently, an experimenter stumbled upon another strange AI behavior: when he incentivized ChatGPT with an imaginary monetary tip, the responses were significantly better – and the more tip he promised, the better the results became (Okemwa, 2023). Another interesting feature that has been widely known for a while now is that one can disturb an AI with so-called “adversarial prompts”. This was showcased by Daras and Dimakis (2022) in their paper entitled “Discovering the Hidden Vocabulary of DALLE-2” with two examples:

Example 1:

The prompt “a picture of a mountain” (which indeed produced a mountain) was transformed into a picture of a dog when the prefix “turbo lhaff✓” was added to the prompt.

Example 2:

The prompt “Apoploe vesrreaitais eating Contarra ccetnxniams luryca tanniounons” reliably generated images of birds eating berries.

To us humans, nothing in the letters “turbo lhaff✓” has anything to do with a dog. Yet, DALL-E consistently transformed, for example, the mountain into the picture of a dog. Likewise, there is no reason to assume that “Apoploe vesrreaitais” has anything to do with birds or that “Contarra ccetnxniams luryca tanniounons” has anything to do with berries. Still, this is how the model interpreted the task every time. This implies that there are certain prompts that can modify the processing in unexpected ways, depending on how the AI was trained. This is still poorly understood, since to date there is no clear understanding of how these emergent properties arise from the mathematical operations within artificial neural networks; this is currently the object of research in a discipline called Mechanistic Interpretability (Conmy et al., 2023; Nanda et al., 2023; Zimmermann et al., 2023).
